This report is for ETC5521 Assignment 1 by Team Quokka, comprising Dea Avega Editya and Siyi Li.

Introduction and motivation

We are motivated to explore the history of slavery in the United States of America (USA) and how this dark period changed over about a hundred years. Using data sourced from the tidytuesday GitHub repository (https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-06-16/readme.md), we look at what the situation was at the time and what we can learn from this history that may correlate with current racist behavior toward black people, especially in the USA.

This report has several limitations:

  1. The data sets used contain relatively few observations, e.g. the record of slave names (african_names) covers only slaves who were rescued during their voyage. Hence, it may not capture the real situation during the history of slavery.

  2. The census data only captures US demographics from 1790 to 1870, which is quite short given the long existence of slavery prior to the census period. In addition, the West region only has census data for 1850 and 1860.

  3. Some proportion of the data has N/A values and errors, which may be omitted during data exploration.

Data description

Our data sets are retrieved from the GitHub repository of the tidytuesday project, which originally sourced them from the US Census’s Archives, Slave Voyages, and Black Past.

There are four data sets in the tidytuesday repo; however, for this report’s purpose we use only three of them:

  1. Census (in csv format) The data set records the total slave population across the USA during the slavery era and has 8 variables (region, division, year, total, white, black, black_free and black_slave) and 102 observations. The data is collected from historical US census data covering the period from 1790 to 1870.

  2. African_names (in csv format) The data set has 11 variables (id, voyage id, name, gender, age, height, ship name, year arrival, port disembark, port embark, and country origin) and 91,490 observations. The data was collected from liberated slaves by recording their names and ages. The records span 1808 to 1862.

  3. Blackpast (in csv format) The data set covers details of events in African-American history from the slavery era to the post-slavery period, including incidents of violence and racism as well as celebrations of achievements. It has 6 variables (year, event, subject, country, state and era) and 896 observations. The data is compiled by the BlackPast organization (blackpast.org) and covers 1492 to 2009.

The wrangling process groups some variables in african_names to obtain an aggregate count for each category, which enables us to compare across categories. We also recalculate the total population in the census data, since the existing total is incorrect for some regions, e.g. the West region (we found this miscalculation after visualizing the data). The proportions of each population group, based on the recalculated totals, will be used to track slavery exploitation in the USA.
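The recalculation described above can be sketched as follows. This is a minimal illustration in base R, not the report's actual code: the rows are made up, the column names follow the variable list given earlier, and it assumes the total is simply the sum of the white, free black and enslaved black counts.

```r
# Toy rows shaped like the census data (all values are made up for illustration).
census <- data.frame(
  region      = c("South", "South", "West"),
  year        = c(1850, 1860, 1860),
  total       = c(1000, 1200, 999),  # the last reported total is deliberately wrong
  white       = c(600, 700, 450),
  black_free  = c(50, 60, 20),
  black_slave = c(350, 440, 30)
)

# Assumption: the total is the sum of these three components.
census$total_fixed <- census$white + census$black_free + census$black_slave

# Flag rows where the reported total disagrees with the recomputed one.
census$total_mismatch <- census$total != census$total_fixed

# Proportion of each group, based on the recomputed total.
groups <- c("white", "black_free", "black_slave")
props <- census[groups] / census$total_fixed
names(props) <- paste0("prop_", groups)
census <- cbind(census, props)

print(census)
```

Keeping the reported total next to the recomputed one makes it easy to see which regions and years were affected before the proportions are plotted.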

Furthermore, this report takes advantage of the fairly comprehensive record of African-American events in the blackpast data set to analyze which regions of the USA seem to be unfriendly to African-American people, based on the negative sentiments observed.

Using all of the data sets mentioned, this report aims to offer a brief explanation for one main question: does the long history of slavery in the USA explain current racism towards African-Americans?

In order to answer the main question, we will first look at these secondary questions: 1. What were the demographics of black slaves? 2. Which region of the USA exploited the practice most? 3. Which region is the most unfriendly for black people?

References of data sets:

  1. Tidytuesday (https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-06-16/readme.md)

  2. Blackpast (https://www.blackpast.org/african-american-history-timeline/)

  3. US census data (https://www.census.gov/content/dam/Census/library/working-papers/2002/demo/POP-twps0056.pdf)

Analysis and findings

Demographic of Black Slaves

Maturity of Slaves

Composition Based on Gender

Age Distribution

In this section, we extract information from the african_names data set related to the demographics of slaves. By extracting and analyzing the data, we get an initial picture of the slavery practice before moving on to further analysis in the following sections. We group the slaves by id (since we are not interested in their names, and different people can have similar names), gender and age.
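A minimal sketch of this grouping step in base R is shown below. The rows are made up, and the four gender labels ("Man", "Woman", "Boy", "Girl") are an assumption about how the african_names data encodes the category; the report's own code may differ.

```r
# Toy records shaped like african_names (values are made up for illustration).
african_names <- data.frame(
  id     = 1:8,
  gender = c("Man", "Man", "Man", "Boy", "Boy", "Woman", "Girl", NA),
  age    = c(25, 30, 22, 10, 12, 28, 9, 20)
)

# Drop records with a missing gender before counting.
known <- african_names[!is.na(african_names$gender), ]

# Count records per gender category and convert the counts to percentages.
gender_counts <- table(known$gender)
gender_share  <- round(100 * prop.table(gender_counts), 1)

# Collapse the four categories into children versus adults.
known$maturity <- ifelse(known$gender %in% c("Boy", "Girl"), "Child", "Adult")
maturity_share <- round(100 * prop.table(table(known$maturity)), 1)

print(gender_share)
print(maturity_share)
```

Counting by id rather than by name avoids merging different people who happen to share a name.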

We can read the gender composition from the pie chart in figure @ref(fig:demo1). As the plot shows, men make up the largest percentage of slaves in the observed data, followed by boys. The proportion of boys is also larger than that of women, while girls make up the smallest percentage. In general, the proportion of males (men and boys) is considerably larger than that of females (women and girls). However, when we look at the proportion of children (boys and girls), as shown in figure @ref(figure:demo), it is very sad to find that they make up almost half of the total slaves.

When we look at the age distribution of slaves, as shown in figure @ref(fig:demo2), the majority are between 15 and 35 years old. This range of productive ages is unsurprising because they were brought into the country mainly as workers, according to prior information we obtained from blackpast.org.

However, there are some outliers in figure @ref(fig:demo2) that we suspect are due to errors in data recording, e.g. a boy aged 40 and a man aged 77 (which we think is far too old to be a slave). Another finding is that some slaves are just 5 months old and some children are under 5 years old, which is extremely young. One possible reason is that their parents were among the slaves and brought their children along.
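One way to surface records like these is to check whether the recorded age agrees with the gender label. The sketch below is only illustrative: the rows are made up, and the cut-off of 18 years between child and adult labels is a hypothetical choice, not something stated in the data.

```r
# Toy records (values are made up for illustration).
records <- data.frame(
  id     = 1:5,
  gender = c("Boy", "Boy", "Man", "Girl", "Woman"),
  age    = c(12, 40, 77, 0.5, 30)
)

adult_age <- 18  # hypothetical cut-off between child and adult labels

is_child_label <- records$gender %in% c("Boy", "Girl")

# A child label with an adult age (or the reverse) suggests a recording error.
records$suspect <- (is_child_label & records$age >= adult_age) |
  (!is_child_label & records$age < adult_age)

# Very young children are plausible, so list them separately instead of flagging them.
records$under_five <- records$age < 5

print(records[records$suspect | records$under_five, ])
```

Note that this rule flags the 40-year-old "Boy" but not the 77-year-old "Man", whose age is unusual yet consistent with his label; unusually high ages would need a separate check.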

The most exploiting region in the USA

Composition of White and Black

To answer the question of which region exploited the slavery practice most, we do not look at the number of black slaves in each region of the USA. Instead, we look at the trend in the proportions of three categories (white, black slaves, and free black people), as seen in figure @ref(fig:compare-region).

From the plot, we can easily see that the South region has the most contrasting pattern, which distinguishes it from the other regions. The proportion of white people in the South region decreases slightly from 1800 until 1860, whereas the proportion of black slaves grows and comprises more than a quarter of the total population in that region.

Proportion of White and Black

After spotting this pattern, we explore the South region further to see the slavery practice at division level. The South region consists of three divisions: South Atlantic, East South Central and West South Central. From figure @ref(fig:south-all) we can clearly see that the East South Central division shows a steadily increasing pattern of exploitation during the observed period, whereas the other divisions tend to have a more stable pattern of slavery practice. However, all of these divisions have a quite similar proportion of black slaves in the last census year of the slavery period (1860).

The most unfriendly region for African-Americans

In this section, we connect the slavery and post-slavery periods in the USA through observed events from the blackpast data set. For this purpose, we first filter the data set to keep only events in the USA. We then keep only the important words, filtering out English stop words using the tidytext R package (Silge and Robinson (2016)). With the essential words in hand, we look for the various sentiments in these words using the NRC sentiment lexicon of Mohammad (2018) and group them by region.
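The pipeline above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the report's code: the events are made up, the region column is supplied directly, and the short stop_words vector and lexicon data frame are small hypothetical stand-ins for the tidytext stop-word list and the NRC lexicon.

```r
# Toy events (made up for illustration); region is given directly here.
events <- data.frame(
  region = c("South", "South", "West"),
  event  = c("A violent mob attacked the town",
             "The court banned the school",
             "A new school opened in the city"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-ins for the real stop-word list and the NRC lexicon.
stop_words <- c("a", "the", "in", "of", "and")
lexicon <- data.frame(
  word      = c("violent", "violent", "mob", "attacked", "banned", "school"),
  sentiment = c("negative", "fear", "fear", "anger", "negative", "positive"),
  stringsAsFactors = FALSE
)

# One row per (region, word): lower-case, strip punctuation, split on spaces.
tokens <- do.call(rbind, lapply(seq_len(nrow(events)), function(i) {
  words <- strsplit(gsub("[^a-z ]", "", tolower(events$event[i])), " +")[[1]]
  data.frame(region = events$region[i], word = words, stringsAsFactors = FALSE)
}))

# Remove stop words, then attach sentiments (a word may carry several).
tokens <- tokens[!tokens$word %in% stop_words, ]
scored <- merge(tokens, lexicon, by = "word")

# Count sentiment occurrences per region.
counts <- table(region = scored$region, sentiment = scored$sentiment)
sentiment_by_region <- as.data.frame(counts)

print(sentiment_by_region)
```

Because one word can map to several sentiments, the counts measure sentiment occurrences rather than the number of events.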

In order to link this analysis with the previous section, we add a region variable that corresponds to the state names in the blackpast data set. For example, we add Northeast to cover states like Connecticut, New York and New Jersey. In addition, we focus on events related to slavery and racist behavior toward African-Americans by selecting only the relevant subjects: “Slave Laws”, “Slave Labor”, “Racial Restrictions”, “Racial Violence”, “Resistance to Enslavement”, “The Slavery Controversy”, and “Antebellum Slavery”.
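A minimal sketch of the region mapping and subject filter, in base R, might look like this. The event rows are made up, and the lookup table covers only a handful of states for illustration; a full version would need one entry per state.

```r
# Toy events (rows are made up for illustration).
events <- data.frame(
  year    = c(1831, 1850, 1919, 1963, 1905),
  state   = c("Virginia", "New York", "Illinois", "Alabama", "California"),
  subject = c("Resistance to Enslavement", "Slave Laws", "Racial Violence",
              "Racial Violence", "Black Education"),
  stringsAsFactors = FALSE
)

# Partial, illustrative state-to-region lookup (US census regions).
state_region <- c(
  "Virginia" = "South", "Alabama" = "South",
  "New York" = "Northeast", "Connecticut" = "Northeast",
  "New Jersey" = "Northeast",
  "Illinois" = "Midwest", "California" = "West"
)

# Add the region variable; states missing from the lookup become NA.
events$region <- unname(state_region[events$state])

# Keep only the subjects related to slavery and racism.
subjects_of_interest <- c(
  "Slave Laws", "Slave Labor", "Racial Restrictions", "Racial Violence",
  "Resistance to Enslavement", "The Slavery Controversy", "Antebellum Slavery"
)
relevant <- events[events$subject %in% subjects_of_interest, ]

print(relevant)
```

Using the same region names as the census data is what allows the sentiment results to be compared with the population proportions from the previous section.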

Sentiment Analysis in Regions

Negative Nuances

Using this method, we can observe occurrences of bad events for African-Americans in the data set, which mostly comprise racial restrictions and racial violence. According to the analysis in figure @ref(fig:sentiment_region), the observed events are dominated by the sentiments negative, fear and anger. Moreover, most recorded events occurred in the South region. This is not surprising given the previous finding that the South is the region that exploited the slavery practice most.

We then focus on particularly bad events, represented by the sentiments “disgust”, “fear”, “negative” and “sadness”, and plot them in a bar chart, as seen in figure @ref(fig:sentiment_negative). Using these sentiments, we can see that the South and Northeast regions appear “unfriendly” to African-American people, since both regions contribute most of these bad events in the observed data. On the other hand, the West region may be a good place to live for African-Americans, since very few bad incidents happened there.

Summary

The history of slavery is deeply rooted in some parts of the USA, and the practice was exploited most in the South region. Slaves were brought into the country mainly as workers, so most were of productive age, ranging from 15 to 35 years old.

According to the sentiment analysis of the blackpast record of events, most bad events took place in the South region, followed by the Northeast region. These bad events are related to slavery and to (post-slavery) racist behavior toward African-Americans.

This finding is quite intriguing since it happens to be supported by a journal article by Chae et al. (2015), which describes these two regions as the most racist regions in the United States.

Bibliography

Arnold, Jeffrey B. 2019. Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’. https://CRAN.R-project.org/package=ggthemes.

Auguie, Baptiste. 2017. GridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.

Chae, David H., Sean Clouston, Mark L. Hatzenbuehler, Michael R. Kramer, Hannah L. F. Cooper, Sacoby M. Wilson, Seth I. Stephens-Davidowitz, Robert S. Gold, and Bruce G. Link. 2015. “Association between an Internet-Based Measure of Area Racism and Black Mortality.” Edited by Hajo Zeeb. PLOS ONE 10 (4): e0122963. https://doi.org/10.1371/journal.pone.0122963.

Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Kassambara, Alboukadel. 2020. Ggpubr: ’Ggplot2’ Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr.

Mohammad, Saif M. 2018. Word Affect Intensities. Miyazaki, Japan.

Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC. https://plotly-r.com.

Silge, Julia, and David Robinson. 2016. “Tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). https://doi.org/10.21105/joss.00037.

Tierney, Nicholas, Di Cook, Miles McBain, and Colin Fay. 2020. Naniar: Data Structures, Summaries, and Visualisations for Missing Data. https://CRAN.R-project.org/package=naniar.

Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2018. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.